Genome Medicine
Top medRxiv preprints most likely to be published in this journal, ranked by match strength.
Purpose: Copy number variants (CNVs) are a major contributor to rare genetic diseases, but their detection and interpretation from short-read genome sequencing (srGS) data remain challenging, especially at scale. Large amounts of existing srGS data remain under-analyzed for clinically relevant CNVs. Methods: During a collaborative Hackathon, we developed and applied scalable CNV analysis workflows to srGS data from three unsolved, exome-negative, rare disease cohorts: Primary Immunodeficiency (N = ...
Background: Despite widespread implementation of exome and genome sequencing, a substantial proportion of rare disease patients remain undiagnosed due to inherent limitations in detecting structural, repetitive, and regulatory variants. Methods: We applied long-read sequencing (LRS) to 40 individuals from 33 previously undiagnosed Korean families. De novo assemblies were integrated into a graph-based pangenome workflow, enabling sensitive detection of single-nucleotide, structural, and tandem-repea...
Severe combined immunodeficiency (SCID) is a heterogeneous, recessive disorder, associated with the onset of severe, recurrent infections in the first few months of life. SCID is fatal if left untreated, but outcomes can be significantly improved by prompt diagnosis and treatment, particularly prior to onset of infection. Consequently, SCID is already included in many newborn screening programmes around the world, as well as multiple international genomic newborn screening (gNBS) research progra...
Despite whole genome sequencing (WGS), why do many single gene disorder cases remain unsolved, impeding diagnosis and preventative care for people whose disease-causing variants escape detection? Early WGS data analytic steps prioritize protein-coding sequences. To simultaneously prioritise variants in non-coding regions rich in transcribed and critical regulatory sequences, we developed GROFFFY, an analytic tool which integrates coordinates for regions with experimental evidence of functionalit...
Synonymous single nucleotide variants (sSNVs), traditionally seen as neutral, are now recognized for their biological impact. To assess their relevance, we developed SyMetrics, a framework that integrates predictors of splicing, RNA stability, evolutionary conservation, codon usage, synonymous variation effects, sequence properties, and allele frequency. We analyzed all possible sSNVs across the human genome, and our machine-learning model achieved 97% accuracy in distinguishing deleterious from...
Copy number variants (CNVs) are significant contributors to the pathogenicity of rare genetic diseases and with new innovative methods can now reliably be identified from exome sequencing. Challenges still remain in accurate classification of CNV pathogenicity. CNV calling using GATK-gCNV was performed on exomes from a cohort of 6,633 families (15,759 individuals) with heterogeneous phenotypes and variable prior genetic testing collected at the Broad Institute Center for Mendelian Genomics of th...
Background: RNA-sequencing is increasingly being used as a complementary tool to DNA sequencing in diagnostics where DNA analysis has been uninformative. RNA-sequencing allows us to identify alternative splicing and aberrant gene expression allowing for improved interpretation of variants of unknown significance (VUS). Additionally, RNA-sequencing provides the opportunity not only to look at the splicing effects of known VUSs but also to scan the transcriptome for abnormal splicing events and expr...
Whole genome sequencing (WGS) is championed by the UK National Health Service (NHS) to identify genetic variants that cause particular diseases. The full potential of WGS has yet to be realised as early data analytic steps prioritise protein-coding genes, and effectively ignore the less well annotated non-coding genome which is rich in transcribed and critical regulatory regions. To address this, we developed a filter, which we call GROFFFY, and validated it in WGS data from hereditary haemorrhagic tela...
For lack of functional evidence, genome-based diagnostic rates cap at approximately 50% across diverse Mendelian diseases. Here, we demonstrate the effectiveness of combining genomics, transcriptomics, and, for the first time, proteomics and phenotypic descriptors, in a systematic diagnostic approach to discover the genetic cause of mitochondrial diseases. On fibroblast cell lines from 145 individuals, tandem mass tag labelled proteomics detected approximately 8,000 proteins per sample and covere...
Exome sequencing is now mainstream in clinical practice; however, identification of pathogenic Mendelian variants remains time consuming, partly because the limited accuracy of current computational prediction methods leaves much manual classification. Here we introduce CAPICE, a new machine-learning based method for prioritizing pathogenic variants, including SNVs and short InDels, that outperforms the best general (CADD, GAVIN) and consequence-type-specific (REVEL, ClinPred) computational prediction m...
Whole genome sequencing (WGS) has the potential to outperform clinical microarrays for the detection of structural variants (SV) including copy number variants (CNVs), but has been challenged by high false positive rates. Here we present ClinSV, a WGS based SV integration, annotation, prioritisation and visualisation method, which identified 99.8% of pathogenic ClinVar CNVs >10kb and 11/11 pathogenic variants from matched microarrays. The false positive rate was low (1.5-4.5%) and reproducibilit...
Visualization of genes and genetic variants as well as transcript structure is essential within the human genetics community. Such illustrations represent a key tool in communicating genetic concepts and facilitating discussions on therapeutic interventions. There are currently no easily usable tools that allow users to draw all features required for a comprehensive overview of a transcript's structure and the localisation of variants of interest. Here we introduce ExonViz, an online appl...
Parents of children with genetic disorders due to de novo variants are counselled on a recurrence risk estimate of 1-5% for further affected siblings, while the actual probability varies between 0 and 50%. This discrepancy is well known, but barely investigated. We enrolled 135 families, in which a child had been previously identified with a pathogenic seemingly de novo variant (in 140 genes). Covering two germ layers, we collected blood (n=269), buccal (n=223) and nail samples (n=223) of both p...
While copy number variants (CNVs) have been identified as an important cause of rare genetic disorders, they have also been identified in unaffected control populations, making clinical interpretation of these lesions challenging. Discriminating benign CNVs from those pathogenic for rare genetic disorders, therefore, relies on understanding what regions of the human genome are tolerant to copy number variation. Benign-Ex is a python-based program that uses information from databases of CNVs to g...
Short-read sequencing (SRS) methods have improved the detection of small genetic variants but remain limited in highly homologous genomic regions, such as segmental duplications with gene-pseudogene pairs. These paralogous regions often require complex, locus-specific assays for accurate analysis. Long-read genome sequencing (lrGS) technologies, such as PacBio HiFi sequencing, can span these regions but still face challenges in variant calling due to alignment ambiguities. Here, we evaluated Pac...
Background: Enterococcus faecium is a commensal of the gastrointestinal tract of animals and humans but also a causative agent of hospital-acquired infections. Resistance against glycopeptides and especially to vancomycin, a first-line antibiotic to treat infections with multidrug-resistant Gram-positive pathogens, has motivated the inclusion of E. faecium in the WHO global priority list. Vancomycin resistance can be conferred by the vanA gene cluster on the transposon Tn1546, which is frequently ...
Fabry disease is a rare lysosomal storage condition in which sphingolipid levels build up to harmful levels in various bodily organs, eventually leading to life-threatening complications such as stroke and kidney failure. Fabry disease is caused by rare pathogenic alleles in the GLA gene on chromosome X and may present as an early or late-onset disease depending on the identity of the causal allele and the severity of its effect on the gene product. Epidemiological studies have widely varied in ...
Background: Whole-genome sequencing (WGS) projects for rare disease diagnosis typically yield a diagnostic rate of approximately 25-40%, dependent particularly on patient selection and the extent of prior genetic testing. The Scottish Genomes Partnership (SGP) is a collaborative research programme involving four Scottish Regional Genetics Centres, four Scottish Medical Schools, and Genomics England's 100,000 Genomes Project. It aims to facilitate genome sequencing and diagnosis for patients in the ...
Structural variants (SVs), including large deletions, duplications, inversions, translocations, and complex SVs have the potential to disrupt gene function resulting in rare disease. Nevertheless, current pipelines and clinical decision support systems for exome sequencing (ES) tend to focus on small alterations such as single nucleotide variants (SNVs) and insertions-deletions shorter than 50 base pairs (indels). Additionally, detection and interpretation of large copy-number variants (CNVs) ar...
Background: Current single nucleotide variant (SNV) pathogenicity prediction tools assess various properties of genetic variants and provide a likelihood of causing a disease. This information aids in variant prioritization, the process of narrowing down the list of potential pathogenic variants, and therefore facilitates diagnostics. Assessing the effectiveness of SNV pathogenicity tools using ClinVar data is a widely adopted practice. Our findings demonstrate that this conventional method ...